And the state of this point here results from its own state and from the states of its neighbors.
This is a classic stencil operation.
Yes, and now let's flesh it out.
If the point has the coordinates i, j, then, for example, its temperature at time t + 1 results from
the temperature at point (i, j) at time t plus its neighbors, weighted by a factor of one quarter: 1, 2, 3, 4.
All of these values are taken at time t, and the neighbors are the four points of the right-angled neighborhood (up, down, left, right).
So, exactly, that would be an example of such a stencil.
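Written out as a formula, one plausible reading of the spoken update rule is the following. Whether the center value carries a weight of one or also a quarter is an assumption here; both readings give the same operation count.

```latex
T^{t+1}_{i,j} = T^{t}_{i,j} + \frac{1}{4}\left(T^{t}_{i-1,j} + T^{t}_{i+1,j} + T^{t}_{i,j-1} + T^{t}_{i,j+1}\right)
```

Counted this way, one update reads 5 values at time t and needs 4 additions plus 1 multiplication by 1/4.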
Yes, exactly, and now I count, assuming these are all floating-point operations.
In this case 0.25 is easy again: for integer data I could get a quarter with two shifts to the right,
but here I compute it as a floating-point operation,
so I have 1, 2, 3, 4, 5 floating-point operations.
I also have to fetch all the data once: 1, 2, 3, 4, 5 values.
Let's say they are all float variables, so 4 bytes each; then I have 20 bytes
for the 5 float variables.
And how many floating-point operations do I execute? Exactly 5:
4 additions plus the multiplication by a quarter, which I compute rather than doing it with a shift.
So that's 5 floating-point operations against 20 bytes, and that ratio is my arithmetic intensity.
Wait a minute, let me get the ratio the right way round.
I'm calculating flops per byte, so the flops go on top: I have exactly 5 flops and 20 bytes,
so that's a quarter flop per byte.
Accordingly, it lands at 0.25 on the x-axis here; that part is no longer visible, but this is where a quarter would be.
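The count above can be reproduced in a few lines. This is a minimal sketch that follows the spoken count exactly: 4 additions plus 1 multiplication, and 5 single-precision loads of 4 bytes each. The names `FLOAT_BYTES` and `arithmetic_intensity` are illustrative, and the store of the result is ignored, as it is in the lecture.

```python
# Arithmetic intensity of the 5-point stencil update, counted as in the lecture.
# Assumption: 4 additions + 1 multiplication by 0.25, 5 float loads of 4 bytes;
# the store of the result is not counted.

FLOAT_BYTES = 4  # size of one single-precision float


def arithmetic_intensity(flops: int, bytes_moved: int) -> float:
    """Return the ratio flops / bytes (flop per byte)."""
    return flops / bytes_moved


additions = 4
multiplications = 1
loads = 5

flops = additions + multiplications   # 5 flops
bytes_moved = loads * FLOAT_BYTES     # 20 bytes
ai = arithmetic_intensity(flops, bytes_moved)

print(f"{flops} flops / {bytes_moved} bytes = {ai} flop/byte")
# prints "5 flops / 20 bytes = 0.25 flop/byte"
```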
And that is the arithmetic intensity. I can get it from the arithmetic itself,
that is, from the description of my algorithm, of my kernel.
Normally this would run in a loop, a loop in which I iterate over t
and compute the state at each successive time step.
By the way, I should add that this is shown for one point in my grid;
the same operations run on all grid points at the same time.
This makes the problem nicely parallelizable.
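The time loop can be sketched as follows. This is a minimal illustration, not the lecture's code: it assumes the update rule "center plus a quarter of the four neighbors" and fixed boundary values, and the names `step` and `simulate` are hypothetical. Each time step writes into a fresh grid and reads only the previous one, which is why all interior points of one step are independent and could be computed in parallel.

```python
# Minimal sketch of the stencil time loop (assumed update rule:
# T[t+1][i][j] = T[t][i][j] + 0.25 * (sum of the four orthogonal neighbors)).
# Boundary values are held fixed; grid contents are illustrative only.

from typing import List

Grid = List[List[float]]


def step(old: Grid) -> Grid:
    """Compute the state at time t+1 from the state at time t.

    Every interior point reads only from `old`, so the updates within
    one time step do not depend on each other (parallelizable).
    """
    rows, cols = len(old), len(old[0])
    new = [row[:] for row in old]  # copying keeps the boundary values unchanged
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            neighbors = old[i - 1][j] + old[i + 1][j] + old[i][j - 1] + old[i][j + 1]
            new[i][j] = old[i][j] + 0.25 * neighbors
    return new


def simulate(grid: Grid, steps: int) -> Grid:
    """Iterate over t, producing one new grid state per time step."""
    for _ in range(steps):
        grid = step(grid)
    return grid


# Usage: a 3x3 grid with boundary value 1.0 and center 0.0.
grid = [[1.0, 1.0, 1.0],
        [1.0, 0.0, 1.0],
        [1.0, 1.0, 1.0]]
after_one = simulate(grid, 1)
print(after_one[1][1])  # prints 1.0
```

Double buffering (separate `old` and `new` grids) is the design choice that makes the parallelism explicit: an in-place update would let some points read neighbors that were already advanced to t + 1.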
Okay, that's clear.
Good.
And, oh yes, right, that's the arithmetic intensity.
I can get it from the description of my algorithm,
assuming that each access here costs me one clock cycle,
i.e., that the data are already there, so to speak, already in the cache.
That is usually not the case; in general it will be somewhat worse,
because I first have to fetch the data from memory.
The measure that takes this into account is the operational intensity.
That means, for the operational intensity, I can't simply count 20 bytes per access here,
because the data first have to be fetched from memory.
So the effective figure will be somewhat less favorable,
not the ideal 20 bytes per access, because I pay the DRAM latency again.
This has to be taken into account, and I have to measure it;
I can't determine it just by looking at my algorithm.
So that is the operational intensity, and this is exactly what you can see here.
So, here I am again.
No, it isn't visible here after all, but later on in the other graph
you can see exactly where the arithmetic intensity lies.
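To make the x-axis position concrete, here is a minimal sketch of the roofline bound: attainable performance is the smaller of the peak compute rate and intensity times memory bandwidth. The machine parameters `PEAK_GFLOPS` and `BANDWIDTH_GBS` are hypothetical placeholders, not figures for any real processor; the intensity plugged in is the 0.25 flop/byte counted above.

```python
# Roofline bound: performance is limited either by peak compute or by
# memory bandwidth times the kernel's intensity (flop per byte).
# The machine numbers below are hypothetical placeholders.


def attainable_gflops(intensity: float, peak_gflops: float, bandwidth_gbs: float) -> float:
    """Roofline upper bound in GFLOP/s for a kernel with the given flop/byte."""
    return min(peak_gflops, intensity * bandwidth_gbs)


PEAK_GFLOPS = 100.0   # hypothetical peak floating-point rate (GFLOP/s)
BANDWIDTH_GBS = 20.0  # hypothetical peak DRAM bandwidth (GB/s)

stencil_bound = attainable_gflops(0.25, PEAK_GFLOPS, BANDWIDTH_GBS)
ridge_point = PEAK_GFLOPS / BANDWIDTH_GBS  # intensity where the two roofs meet

print(stencil_bound, ridge_point)  # prints 5.0 5.0
```

With these assumed numbers the stencil sits far to the left of the ridge point, so it would be memory-bound: it could reach at most 5 of the 100 GFLOP/s the hypothetical processor offers.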
So, to sum up: this example
is taken from the book by Patterson, Computer Organization and Design.
Accessible via: open access
Duration: 01:20:43 min
Recording date: 2017-12-18
Uploaded: 2019-05-01 10:39:03
Language: de-DE
- Organizational aspects of CISC and RISC processors
- Handling of hazards in pipelines
- Advanced techniques for dynamic branch prediction
- Advanced cache techniques, cache coherence
- Exploiting cache effects
- Architectures of digital signal processors
- Architectures of homogeneous and heterogeneous multicore processors (Intel Core i7, Nvidia GPUs, Cell BE)
- Architecture of parallel computers (cluster computers, supercomputers)
- Efficient hardware-oriented programming of multicore processors (OpenMP, SSE, CUDA, OpenCL)
- Performance modeling and analysis of multicore processors (roofline model)

- Patterson/Hennessy: Computer Organization and Design
- Hennessy/Patterson: Computer Architecture: A Quantitative Approach
- Stallings: Computer Organization and Architecture
- Märtin: Rechnerarchitekturen